Empirical comparisons of various discretization procedures

Authors

Petr Berka

Ivan Bruha

Abstract

Genuinely symbolic machine learning (ML) algorithms can process only symbolic, categorical data. However, real-world problems, e.g. in medicine or finance, involve both symbolic and numerical attributes. Discretizing (categorizing) numerical attributes is therefore an important issue in ML. Quite a few discretization procedures exist in the ML field. This paper describes two newer algorithms for categorization (discretization) of numerical attributes. The first one is implemented in KEX (Knowledge EXplorer) as its preprocessing procedure. Its idea is to discretize the numerical attributes in such a way that the resulting categorization fits the way KEX creates a knowledge base. Nevertheless, the resulting categorization is also suitable for other machine learning algorithms. The other discretization procedure is implemented in CN4, a large extension of the well-known CN2 machine learning algorithm. The range of a numerical attribute is divided into intervals that may form a complex generated by the algorithm as a part of the class description. Experimental results compare the performance of KEX and CN4 on some well-known ML databases. To make the comparison more informative, other ML algorithms such as ID3 and C4.5 were also run in our experiments. The results are then compared and discussed.
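As a purely illustrative sketch of what discretizing a numerical attribute means, the following Python snippet splits an attribute's range into equal-width intervals and maps each value to an interval index. This is a generic baseline technique, not the KEX or CN4 procedure, whose details the abstract does not give; the function names and the sample data are assumptions made up for illustration.

```python
from bisect import bisect_right


def equal_width_cut_points(values, n_bins):
    """Return the n_bins - 1 interior cut points that split
    [min(values), max(values)] into n_bins equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def discretize(value, cut_points):
    """Map a numeric value to the index of the interval it falls into."""
    return bisect_right(cut_points, value)


# Hypothetical numerical attribute (e.g. patient age); the data is made up.
ages = [23, 35, 47, 51, 62, 78]
cuts = equal_width_cut_points(ages, 3)
labels = [discretize(a, cuts) for a in ages]
```

After this step each value carries a categorical interval index, so a purely symbolic learner could treat the attribute like any other categorical one. Procedures such as those compared in the paper choose the interval boundaries using information about the classes rather than fixed widths.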

Similar resources

Parametric Empirical Bayes Test and Its Application to Selection of Wavelet Threshold

In this article, we propose a new method for selecting level-dependent thresholds in wavelet shrinkage using the empirical Bayes framework. We employ both Bayesian and frequentist hypothesis testing instead of point estimation. The best test yields the best prior and hence the more appropriate wavelet thresholds. The standard model functions are used to illustrate the performance of the p...

Stochastic Comparisons of Probability Distribution Functions with Experimental Data in a Liquid-Liquid Extraction Column for Determination of Drop Size Distributions

The droplet size distribution in the column is usually represented as the average volume to surface area, known as the Sauter mean drop diameter. It is a key variable in the extraction column design. A study of the drop size distribution and Sauter-mean drop diameter for a liquid-liquid extraction column has been presented for a range of operating conditions and three different liquid-liquid sy...

Comparison of Model Selection for Regression

We discuss empirical comparison of analytical methods for model selection. Currently, there is no consensus on the best method for finite-sample estimation problems, even for the simple case of linear estimators. This article presents empirical comparisons between classical statistical methods - Akaike information criterion (AIC) and Bayesian information criterion (BIC) - and the structural ris...

Under Review in Neural Computation, 2002 Comparison of Model Selection for Regression

We discuss empirical comparison of analytical methods for model selection. Currently, there is no consensus on the ‘best’ method for finite-sample estimation problems, even for the simple case of linear estimators. This paper presents empirical comparisons between classical statistical methods (AIC, BIC) and the SRM method (based on VC-theory) for regression problems. Our study is motivated by ...

THE EMPIRICAL BAYES METHOD OF ANALYSIS OF A SERIES OF EXPERIMENTS

The classical method of analysis of a series of experiments is somewhat involved in being conditional on various, occasionally unrealistic, assumptions such as homogeneity of variances of experimental error, lack of interactions of treatments and places, etc. In this work, we adopt a Bayesian view to account for such heterogeneities. Our approach is illustrated by a real series of experiment...

Publication date: 1995
